An Interactive Spreadsheet For Teaching The Forward-Backward Algorithm

Author

  • Jason M. Eisner
Abstract

This paper offers a detailed lesson plan on the forward-backward algorithm. The lesson is taught from a live, commented spreadsheet that implements the algorithm and graphs its behavior on a whimsical toy example. By experimenting with different inputs, one can help students develop intuitions about HMMs in particular and Expectation Maximization in general. The spreadsheet and a coordinated follow-up assignment are available.

1 Why Teach from a Spreadsheet?

Algorithm animations are a wonderful teaching tool. They are concrete, visual, playful, sometimes interactive, and remain available to students after the lecture ends. Unfortunately, they have mainly been limited to algorithms that manipulate easy-to-draw data structures.

Numerical algorithms can be “animated” by spreadsheets. Although current spreadsheets do not provide video, they can show “all at once” how a computation unfolds over time, displaying intermediate results in successive rows of a table and on graphs. Like the best algorithm animations, they let the user manipulate the input data to see what changes. The user can instantly and graphically see the effect on the whole course of the computation.

Spreadsheets are also transparent. In Figure 1, the user has double-clicked on a cell to reveal its underlying formula. The other cells that it depends on are automatically highlighted, with colors keyed to the references in the formula. There is no programming language to learn: spreadsheet programs are aimed at the mass market, with an intuitive design and plenty of online help, and today’s undergraduates already understand their basic operation. An adventurous student can even experiment with modifying the formulas, or can instrument the spreadsheet with additional graphs.

Figure 1: User has double-clicked on cell D29.

Finally, modern spreadsheet programs such as Microsoft Excel support visually attractive layouts with integrated comments, color, fonts, shading, and drawings.
This makes them effective for both classroom presentation and individual study.

This paper describes a lesson plan that was centered around a live spreadsheet, as well as a subsequent programming assignment in which the spreadsheet served as a debugging aid. The materials are available for use by others.

Students were especially engaged in class, apparently for the following reasons:

  • Striking results (“It learned it!”) that could be immediately apprehended from the graphs.
  • Live interactive demo. The students were eager to guess what the algorithm would do on particular inputs and test their guesses.
  • A whimsical toy example.
  • The departure from the usual class routine.
  • Novel use of spreadsheets. Several students who thought of them as mere bookkeeping tools were awed by this, with one calling it “the coolest-ass spreadsheet ever.”

2 How to Teach from a Spreadsheet?

It is possible to teach from a live spreadsheet by using an RGB projector. The spreadsheet’s zoom feature can compensate for small type, although undergraduate eyes prove sharp enough that it may be unnecessary. (Invite the students to sit near the front.)

Of course, interesting spreadsheets are much too big to fit on the screen, even with a “View / Full Screen” command. But scrolling is easy to follow if it is not too fast and if the class has previously been given a tour of the overall spreadsheet layout (by scrolling and/or zooming out). Split-screen features such as hide rows/columns, split panes, and freeze panes can be moderately helpful; so can commands to jump around the spreadsheet, or switch between two windows that display different areas. It is a good idea to memorize key sequences for such commands rather than struggle with mouse menus or dialog boxes during class.

3 The Subject Matter

Among topics in natural language processing, the forward-backward or Baum-Welch algorithm (Baum, 1972) is particularly difficult to teach.
The algorithm estimates the parameters of a Hidden Markov Model (HMM) by Expectation-Maximization (EM), using dynamic programming to carry out the expectation steps efficiently.

HMMs have long been central in speech recognition (Rabiner, 1989). Their application to part-of-speech tagging (Church, 1988; DeRose, 1988) kicked off the era of statistical NLP, and they have found additional NLP applications to phrase chunking, text segmentation, word-sense disambiguation, and information extraction.

The algorithm is also important to teach for pedagogical reasons, as the entry point to a family of EM algorithms for unsupervised parameter estimation. Indeed, it is an instructive special case of (1) the inside-outside algorithm for estimation of probabilistic context-free grammars; (2) belief propagation for training singly-connected Bayesian networks and junction trees (Pearl, 1988; Lauritzen, 1995); (3) algorithms for learning alignment models such as weighted edit distance; (4) general finite-state parameter estimation (Eisner, 2002).

Before studying the algorithm, students should first have worked with some if not all of the key ideas in simpler settings. Markov models can be introduced through n-gram models or probabilistic finite-state automata. EM can be introduced through simpler tasks such as soft clustering. Global optimization through dynamic programming can be introduced in other contexts such as probabilistic CKY parsing or edit distance. Finally, the students should understand supervised training and Viterbi decoding of HMMs, for example in the context of part-of-speech tagging.

Even with such preparation, however, the forward-backward algorithm can be difficult for beginning students to apprehend.
It requires them to think about all of the above ideas at once, in combination, and to relate them to the nitty-gritty of the algorithm, namely

  • the two-pass computation of mysterious α and β probabilities
  • the conversion of these prior path probabilities to posterior expectations of transition and emission counts

Just as important, students must develop an understanding of the algorithm’s qualitative properties, which it shares with other EM algorithms:

  • performs unsupervised learning (what is this and why is it possible?)
  • alternates expectation and maximization steps
  • maximizes p(observed training data) (i.e., total probability of all hidden paths that generate those data)
  • finds only a local maximum, so is sensitive to initial conditions
  • cannot escape zeroes or symmetries, so they should be avoided in initial conditions
  • uses the states as it sees fit, ignoring the suggestive names that we may give them (e.g., part of speech tags)
  • may overfit the training data unless smoothing is used

The spreadsheet lesson was deployed in two 50-minute lectures at Johns Hopkins University, in an introductory NLP course aimed at upper-level undergraduates and first-year graduate students. A single lecture might have sufficed for a less interactive presentation. The lesson appeared in week 10 of 13, by which time the students had already been exposed to most of the preparatory topics mentioned above, including Viterbi decoding of a part-of-speech trigram tagging model. However, the present lesson was their first exposure to EM or indeed to any kind of unsupervised learning.

Figure 2: Initial guesses of parameters.

Figure 3: Diary data and reconstructed weather.

4 The Ice Cream Climatology Data

[While the spreadsheet could be used in many ways, the next several sections offer one detailed lesson plan. Questions for the class are included; subsequent points often depend on the answers, which are concealed here in footnotes. Some fragments of the full spreadsheet are shown in the figures.]

The situation: You are climatologists in the year 2799, studying the history of global warming. You can’t find any records of Baltimore weather, but you do find my diary, in which I assiduously recorded how much ice cream I ate each day (see Figure 3). What can you figure out from this about the weather
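
The two-pass computation of α and β described in Section 3, and the conversion of those quantities into expected transition and emission counts, can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the spreadsheet's own formulas: the state names, the diary values, and the starting parameters below are all made up for illustration, the model has no explicit stopping probabilities, and no rescaling is done, so it is suitable only for short sequences.

```python
STATES = ("H", "C")   # hypothetical hidden states: hot day, cold day
SYMBOLS = (1, 2, 3)   # hypothetical observations: ice creams eaten per day


def forward_backward(obs, init, trans, emit):
    """Two-pass computation: return (alpha, beta, likelihood)."""
    n = len(obs)
    alpha = [dict() for _ in range(n)]
    beta = [dict() for _ in range(n)]
    # Forward pass: alpha[t][s] = p(obs[0..t], state on day t is s)
    for s in STATES:
        alpha[0][s] = init[s] * emit[s][obs[0]]
    for t in range(1, n):
        for s in STATES:
            alpha[t][s] = emit[s][obs[t]] * sum(
                alpha[t - 1][r] * trans[r][s] for r in STATES)
    # Backward pass: beta[t][s] = p(obs[t+1..] | state on day t is s)
    for s in STATES:
        beta[n - 1][s] = 1.0
    for t in range(n - 2, -1, -1):
        for s in STATES:
            beta[t][s] = sum(
                trans[s][r] * emit[r][obs[t + 1]] * beta[t + 1][r]
                for r in STATES)
    likelihood = sum(alpha[n - 1][s] for s in STATES)
    return alpha, beta, likelihood


def em_step(obs, init, trans, emit):
    """One EM iteration: expected counts (E), then renormalization (M)."""
    alpha, beta, z = forward_backward(obs, init, trans, emit)
    n = len(obs)
    # gamma[t][s]: posterior probability of state s on day t
    gamma = [{s: alpha[t][s] * beta[t][s] / z for s in STATES}
             for t in range(n)]
    # xi[r][s]: expected number of r -> s transitions over the sequence
    xi = {r: {s: 0.0 for s in STATES} for r in STATES}
    for t in range(n - 1):
        for r in STATES:
            for s in STATES:
                xi[r][s] += (alpha[t][r] * trans[r][s]
                             * emit[s][obs[t + 1]] * beta[t + 1][s] / z)
    new_init = dict(gamma[0])
    new_trans = {r: {s: xi[r][s] / sum(xi[r].values()) for s in STATES}
                 for r in STATES}
    new_emit = {}
    for s in STATES:
        total = sum(g[s] for g in gamma)
        new_emit[s] = {
            o: sum(g[s] for g, x in zip(gamma, obs) if x == o) / total
            for o in SYMBOLS}
    return new_init, new_trans, new_emit, z, gamma


# Made-up diary and initial guesses (NOT the data or parameters from the paper).
obs = [2, 3, 3, 2, 3, 1, 1, 2, 1, 1]
init = {"H": 0.5, "C": 0.5}
trans = {"H": {"H": 0.8, "C": 0.2}, "C": {"H": 0.2, "C": 0.8}}
emit = {"H": {1: 0.1, 2: 0.2, 3: 0.7}, "C": {1: 0.7, 2: 0.2, 3: 0.1}}

for iteration in range(5):
    init, trans, emit, z, gamma = em_step(obs, init, trans, emit)
    print(iteration, z)  # p(observed data) under the parameters before this update
```

Printing the likelihood at each iteration gives a quick check on one of the qualitative properties listed in Section 3: the value should never decrease from one iteration to the next. Changing the initial guesses (for example, making the two states' emission tables identical) is a simple way to explore the sensitivity to initial conditions and symmetries.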





Publication date: 2002